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Abstract. Since their introduction, Boolean networks have been traditionally 
studied in view of their rich dynamical behavior under different update protocols 
and for their qualitative analogy with cell regulatory networks. More recently, tools 
borrowed from statistical physics of disordered systems and from computer science have 
provided a more complete characterization of their equilibrium behavior. However, the 
largest part of the results have been obtained in the thermodynamic limit, which is 
often far from being reached when dealing with realistic instances of the problem. The 
numerical analysis presented here aims at comparing - for a specific family of models - 
the outcomes given by the heuristic belief propagation algorithm with those given by 
exhaustive enumeration. In the second part of the paper some analytical considerations 
on the validity of the annealed approximation are discussed. 
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1. Introduction 

Boolean Networks (BNs) are dynamical models originally introduced by S. Kauffman 
in the late 60s p]. Since Kauffman's seminal work, they have been used as abstract 
modeling schemes in many different fields, including cell differentiation, immune 
response, evolution, and gene-regulatory networks (for an introductory review see [2] 
and references therein). In recent days, BNs have received a renewed attention as 
a powerful scheme of data analysis and modeling of high-throughput genomics and 
proteomics experiments [3]. 

The bottom-line of previous research has been the description and classification of 
different attractor types present in BNs under deterministic parallel update dynamics 
[H HI El [6]. Special attention was dedicated to so-called critical BNs [7] situated at 
the transition between ordered and chaotic regimes. Recently, a new point of view has 
been introduced that casts the original dynamical problem into a constraint satisfaction 
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problem, that can be studied with theoretical and algorithmic tools inspired from the 
statistical mechanics of disordered systems [H [9] . 

Following this new approach presented in [TO], EEL] it is possible to study the 
organization of fixed points in the thermodynamic limit in random Boolean networks. 
This leads to the identification of the sudden emergence of a computational core, whose 
existence is a necessary (but not sufficient) condition for a globally complex phase where 
all fixed points are organized in an exponential number of macroscopically separated 
clusters. This phenomenon is found to be robust with respect to the choice of the 
Boolean functions, and missing only in networks where all boolean functions are of 
AND or OR type. In addition, the size of the complex regulatory phase is found to grow 
with the number K of inputs to the Boolean functions. 

The main motivation for this work is to check the robustness of the above-reported 
results in networks of finite size, both from the theoretical and algorithmic point of view. 
We aim at understanding how reliable the predictions of message passing algorithms such 
as Belief Propagation (BP) are on graphs of finite-size. 

We performed an extensive numerical check on the number of solutions in small 
samples, comparing BP predictions to exact exhaustive enumeration algorithms results. 
Results of BP on directed BNs are found to be in very good agreement with exact 
ones when counting the expected number of fixed points. A worse performance of BP is 
found in its prediction of local quantities such as the single nodes marginal probabilities. 
Nevertheless, the performance seems to increase with system sizes for already moderate 
size values. 

In section [2] we define the model, in section [3] we describe the Belief Propagation 
algorithm and the strategy we adopted to obtain exhaustively all solutions of our 
BN's model. In section H] we quantify numerically the agreement between BP and 
the exhaustive search results. Furthermore, a brief analysis on the distribution of the 
overlap of the solutions is presented. Finally in section [5] we present the analytical 
computation of the first and second moment of the average number of solutions of 
our model of BNs. Numerical solutions of the BP equations strongly suggest that the 
annealed calculation of the entropy (logarithm of the number of BN fixed points) is 
exact in the large size random case. Fluctuations over the annealed value are computed 
numerically and successfully confronted with analytical estimates of the second moment 
of the number of fixed points. The large size limiting value of the fluctuations around 
the annealed result is also computed, and some considerations are discussed. 

2. Definition of the model 

In the following we will consider a BN of A" interacting variables Xj G {0, 1} with 
j G {1, . . . , A^}. In general, each variable can be regulated by K other parent variables, 
and can enter in the regulation of an arbitrary number of child variables. We then 
consider M Boolean functions, represented by F a with a G {1, . . . , M}, depending on 
K = 2 inputs and having a single output. The truth value of a given output variable 
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x a is then fixed by the truth values of the regulating variables x ai , x a2 via the relation: 

•Ea Pa {pCai j -^ai ) (1) 

with o6ic{l,..., N} running over all regulated genes. As shown in Fig. [2} not all 
variables need to be controlled by a Boolean function, i.e. in general we have \A\ = M 
with < M < N. On the other hand, each regulated variable is the output of one and 
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Figure 1. Factor graph representation of a small Boolean network: circles symbolize 
the variables, squares the Boolean functions. Arrows stress the directed nature of the 
graph. Variables xi,X2,x?,xs are external inputs (non-regulated variables). 



only one function. The whole set of M = aN Boolean constraints completely specifies 
the network topology. We aim at computing J\f BO i, i.e. the number of stationary points of 
the network. We introduce a Hamiltonian, that it is equal to the number of unsatisfied 
Boolean constraints: 

"H ^ ^ X a © F a {x ai , X a2 ) = ^ ^ E a {Xai X ai , X a2 ) • (2) 

where the symbol © stands for the logical operation XOR. 

We will consider random factor graphs characterized (among all possible random 
graphs) by: 

(a) Function nodes F a have fixed in-degree K and out-degree one. 

(b) Variables x a have in-degree at most one. This means that all regulating variables 
are collected in one single constraint F a (see Eq. (CQ)). 

Setting a := M/N, the degree distribution of variable nodes approaches asymptotically 
P° ut Kut) = e- 2a ^^ 

"out- 

P m (d in ) = aS dintl + (1 - a)5d in ,o (3) 
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We now have to specify the functions in the factor nodes present on the random 
factor graph defined so far, i.e we have to specify not only the topology of the graph, 
but also the content. There are 2 22 = 16 possible Boolean functions with 2 inputs and 
1 output. Following [12], we group them into 4 classes: 

• The two constant functions equal to and 1. 

• Four functions depending only on one of the two inputs, i.e. X\, x±, X2, x~2- 

• AND-OR class: There are eight functions, which are given by the logical AND or 
OR of the two input variables, or of their negations. These functions are canalizing. 
If, e.g., in the case F(xi,x 2 ) = X\ A x 2 the value of X\ is set to zero, the output 
is fixed to zero independently of the value of x<i- It is said that X\ is a canalizing 
variable of F with the canalizing value zero. 

• XOR class: The last two functions are the XOR of the two inputs, and its negation. 
These two functions are not canalizing, whatever input is changed, the output 
changes too. 

For the sake of clarity we concentrate only on classes depending effectively on two 
inputs, i.e. on those in the AND-OR class and the XOR class. As we will see in the 
following sections, the average number of fixed points does not depend on the relative 
appearance of the functions within each class, but only on the relative appearance of the 
classes. We therefore require XM functions to be in the XOR class, and the remaining 
(1 — X)M functions to be of the AND-OR type, with < X < 1 being a free model 
parameter. In this simple case the network ensemble is then completely defined by a 
and X. 

3. Methods: Belief-Propagation vs. Exact Enumeration 

Belief propagation is a marginalization algorithm used with success in many different 
fields. It is exact on a tree, i.e. on a graph with no loops, but it is also known to give 
good estimates of the marginals in other cases. 

In this section we will first introduce the BP algorithm, and then we will explain 
the strategy adopted for the exhaustive search. 

3.1. The BP algorithm 

The BP algorithm [TBI [T4l [15] is an iterative strategy for computing the fraction of 
zero-energy configurations having, say, Xi — 0. More generally, BP calculates marginal 
probability distributions of solutions of problems defined on factor graphs. A more 
detailed introduction on the BP equations in the case of BNs can be found in [TT] . 

Let us define a one of the clauses where appears. One can introduce the following 
quantities: 

• /ij_ >a (xj): the probability that variable % takes value Xi in the absence of clause a. 
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function proportional to the probability that clause a is satisfied 
when variable i takes value cCj. 

The above-defined quantities satisfy the following set of equations: 
m a ^i(xi) = [1 - F a ({xi}i ea )] ] [ Hb^i(xi) 

{x;}Ga\i b£a\i 
Hi^a(Xi) = Ci^ a ] [ m b ^i{Xi) , (4) 

where a\z represents the set of all variable indexes belonging to function a except variable 
index i, i\a the set of all function indexes to which variable Xj belongs, except index a, 
and Cj_> a is a constant enforcing the normalization of the probability distribution /Xj_> a . 
This set of equations can be numerically iterated up to their fixed point, if present. It 
turns out [H] that, in the case of our model, the iterative procedure always converges. 
Marginals can be computed in terms of the messages in Eq. (jlj) via the following set of 
relations: 

P a {{Zi}iea) = C a [1 - E a ({Xi} iea )] Yl^a(Xi) 

Pi(xi) = CiY[rn a ^i , (5) 

where c a and q are again normalization constants. From the marginals we can 
eventually compute the entropy, i.e. the logarithm of the number (Af p ) of zero energy 
configurations: 

S= logCMp) = -^Pa{{xi}i ea )^Pa{{xi}iea)-^{di-l)Pi{xi)]nP i (x i ) .(6) 

aeA i£X 

3.2. Exhaustive search 

No known polynomial algorithm is generally able to exhaustively find all the solutions 
of a Boolean constraint satisfaction problem like this one. There is however a number 
of efficient implementations of exhaustive search strategies - still exponential in the 
running times - that allow to explore problems of reasonable size. 

In our case we have mapped the random BN instances onto conjunctive normal form 
(CNF) formulas. Such instances are made of several clauses put in AND disjunction, 
each clause being made of literals in OR conjunction. This is the natural form of the 
well-known satisfiability problem. The mapping is done writing in the CNF formula all 
the configurations which violate a clause in the RBN instance, and negating the literals 
for the true variables. As an example, it is simple to verify that a AND node involving 
Xi, Xi e x 3 as: 

X\ © (x 2 A x 3 ) , 
can be written in CNF in the following way: 

(xi V x~2 V X3) A (x~i V x 2 V x 3 ) A V x 2 V x^) A (aTf V x~2 V x 3 ) . 
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Once the problem is cast in this form, we can exploit the vast number of very 
well performing algorithms existing for solving satisfiability instances [16]. We are 
interested now in exhaustive search programs, among which we choose relsat 2.00 
[TT] . This award-winning program can perform a complete search of the solution space 
for instances up to ~ 500 variables (in our model) in accessible time using a common 
pc. 

4. Numerical Results 

In this section we will check the predictive precision of the BP algorithm measuring the 
entropy and the magnetization vector. 

4-1. Entropy 

In [11] it has been pointed out using an heuristic argument that, for this model of random 
BNs, the average number of solutions is always equal to 2 N ~ AI i.e. to the number of all 
possible external input configurations (note that N — M is exactly the number of non- 
regulated sites). A direct numerical check of Eq. (Tj0) on single samples shows that the BP 
predictions are always in agreement with the above-mentioned heuristic prediction, so 
that for any sample, the entropy density s = S/N = (N — M)/7Vlog(2) = (1 — a) log(2), 
independently from the sample realization and the percentage X of XOR functions, apart 
from terms that vanish when N — > oo. In Fig. ([2]) we display the frequency distribution 
of In J\f so i for 10000 samples at different values of a and X. Comparing these histograms 
one can observe that 

• The most probable value of P(s = ]n(J\f so i) / N) depends strongly on a and only 
mildly on X. 

• The distributions at increasing N seem to peak around the value s = (1 — a) In 2. 
4-2. Magnetization 

So far we have analyzed the behavior of the entropy alone. Although the entropy is 
very well predicted by BP, this is not always the case of the single variables marginal 
probabilities Pi(xi) These quantities are of extreme importance in any decimation 
procedure that is expected to find a specific fixed point via BP. Necessary condition for 
any decimation procedure to be effective is, for each non pathological sample, a strong 
positive correlation between magnetization vectors m BP and m EX , iV-dimensional 
vectors whose i-th component represents the magnetization of each variable Xi defined 
as 

m BP/EX = p BP/EX {1) _ p BP/EX {Q) (?) 

where Pf x (x) = Pf rue (x) = Af so i(xi = x)/Af so i. In order to assess this point, we define 
two global parameters testing respectively: 



Finite size corrections 



7 



N = 100M = 80X = 20% 
A> = 5» M = 40X = 20% 



60 
40 

20 - 

~ 
0.1 



0.12 0.13 

log(N, ol )/N 



350 
300 - 
250 
200 
150 
100 
50 - 
- 



N = 100M = 60X = 20% 
N = 50 M = 30X = 20% 



0.26 0.27 



400 
350 
300 
250 
200 
150 
100 
50 




N = 100M=80X = 80% 
N = 50 M = 40 X = 80% 



0.12 0.13 0.14 

Iog(N sol )/N 




0.26 0.27 0.28 



Figure 2. Histograms of the exhaustive algorithm estimates measured on 10000 
samples of N = 50 (red boxes) and N = 100 (blue boxes) for four different choices 
of a e {0.6,0.8} and X € {20%, 80%} expressed as a percentage. The black arrow 
is the numerical estimate using BP, that agrees perfectly with the theoretical value 
a = (1 - a) In 2. 
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Figure 3. Histogram of cos(9bp-ex ) compared with the reference histogram of 
cos(6ex-tt(ex)) (l e ft panel). In the right panel we display the histogram of both 
d-BP-EX and d EX —K(EX) ■ Measurement are done at N = 100 M = 90 and a percentage 
of XOR functions of X = 20%, 60% over an ensemble of 10 4 samples. 
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• the probability distribution of the relative angular overlap of the two magnetization 
vectors in the N dimensional space, and 

• the probability distribution of relative magnetization euclidean distances. 

For each sample, the first overlap parameter is defined cos(8bp), i-e as the cosine of the 
angle between the exhaustive and BP magnetization vectors: 

rh BP ■ rh EX 

COs(e BP _ EX ) EE |mgp| j^j • (8) 

The second is defined as the euclidean distance between non normalized magnetizations: 

1 



dBP-EX — ~jy 



N 



£ (mf p - mf x f (9) 



In both cases the predictions have been tested against a null hypothesis. In the null 
hypothesis, random magnetization vectors are extracted in the following way: for 
each sample m n( - EXS> is a random permutation of the components of m EX , so that 
m n{EX) ^ m EX ^ where 7r(i) is a random permutation of the ordered set {1, . . . , N}. 
The quantities cos(9ex~-k(ex)) and oIex--k(ex) are then calculated substituting m n( ~ EX ' ) 
to m BP in eqs.flH]) and (Q. Distributions of the overlaps over the sample populations 
are plotted in figure [31 The results show a strong correlation between the true and the 
predicted magnetization vector distributions. 



4-3. Solutions overlap 

It has been indicated in PHI tH] that in this model the entropy is analytic in all the phase 
diagram, while the organization of the fixed points undergoes a sudden reorganization 
at some ad(X): 

• At a < otd{X) all solutions are in a single cluster, i.e. any pair of solution is 
connected by a path via other solutions, where in each step only a finite number of 
variables can be changed. 

• At a > ad{X) The space of solutions spontaneously breaks into an exponential 
number of macroscopically separated clusters of fixed points. Their number, or 
more precisely its normalized logarithm, is called complexity. It is a first-order 
phase transition. 

This behavior is characterized by the appearance of a non-trivial structure of the 
space of solutions. In other words the fixed points, rather than being uniformly scattered 
over the A^-dimensional hypercube, start to organize themselves in clusters, with a well 
defined intra-cluster and inter-cluster overlap. Nevertheless, by means of the exhaustive 
enumeration technique introduced above, one can easily write down all the solutions for 
a given sample and hence calculate all the M so \{M so \ — 1) /2 overlaps, defined in a standard 
way as q ab = 4 J2iLi a t a \- Note that we are using spin variables here cr, = — 1 + 2xj, 
and a, b indicates two distinct solutions. The distribution of the overlaps for two sizes 
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and three different choices of X are shown in figure HI We choose a value far apart form 
the clustering transition line (no XOR functions), a value in the vicinity of the transition 
line (50% of XOR functions) and a value deep inside the clustered phase (100% of XOR 
functions). 




Figure 4. Distribution of the overlaps for 10000 samples at a = 0.93. X = (left 
panel), X = 0.5 (central panel), X = 1 (right panel), for N = 50 (thin line) and 
TV = 100 (thick line). 



It is clear that as one moves from the unclustered to the clustered phase (which 
at fixed a is equivalent to increasing X), the distribution changes from a broad shape 
to a two-peaked one. However, this two-peaked pattern is just a finite size effect, as it 
seems to be indicated by the reduction of the weight of the right peak of the P(q) from 
N = 50 to N = 100 shown in the left panel of figure HI As the number of clusters is 
exponential in the system size, the probability of extracting two random solutions from 
the same cluster is negligibly small even for moderate sizes. 

This phenomenon can be understood by the following qualitative argument: let 
us suppose that the number of clusters as well as the size of each single cluster 
are exponential in the system size. Let us further suppose that the distribution 
of cluster sizes is strongly peaked around a single value, A4oi-in-ci (l ess stringent 
conditions can be found, together with pathological cases where those conditions do 
not hold, but a complete treatment would go beyond the scope of present work). In 
a completely clustered phase the weight contribution of the intra-cluster overlap will 
then be proportional to A / " c ijV2 )1 _ ill _ cl where jV c i is the number of clusters. On the other 
hand, the contribution to the inter-cluster overlap is proportional to A r jA / ^ )1 _ in _ cl (i.e. 
all couple of solutions except those in the same clusters). If A/" c i is exponential in N, the 
intra-cluster contribution will be negligible with respect to the inter-cluster one in the 
thermodynamic limit. 

5. Some considerations on the validity of the annealed approximation 

In this section we will show that the logarithm of the number of solutions (i.e. the 
entropy) of our model of BNs computed within the annealed approximation scheme 
agrees with the numerical estimate for the entropy we performed by exhaustive 
enumeration and by the numerical solution of the BP equations on single sample. 
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This feature allows us to make some conjectures, and check whether the annealed 
approximation is exact in the iV — > oo limit. We will show that the variance of P(Af S oi) 
as well as all higher moments are proportional to 2^ 1 ~ a ' N in the large N limit, with 
a proportionality constant depending only on a and X. This implies that a non zero 
contribution to the entropy is given at the leading order only by the external regulating 
variables, in accordance with results obtained in [TQl [11] . However, we will also show 
how, in the general case of X < 1, the proportionality constants are different from one 
and greater than it, not ensuring the exactness of the annealed result, albeit replica 
results given in [101 [11] £1X6 £L strong hint in that direction. 

Moreover, we will see that in the thermodynamic limit: 

j^S = C(a,X) (10) 

where C(a,X) is a constant independent of N. Only in the case of pure XOR classes 
(X = 1), it is possible to show that the proportionality constant is exactly 1 the 
large N limit for any order moment and any value of a, implying that the annealed 
approximation is exact. 

Let's consider a distribution of positive random variables Z,. As we are interested in 
the relation between the quenched entropy and the annealed approximation, we want to 
investigate the relation between (InZ) and ln(Z), where ( ) identifies here the average 
over the distribution of Zj. We can write: 



(InZ) = \n(Z) + (In—) = ln(Z) + g H~ } (U) 

where we have expressed the quenched entropy in term of the annealed one plus a series 
of the moments of the distribution. In our case we take Z = M so i- 111 cases where the 
sum of the series of moments is finite or for any X and a diverges as a function f(N) 
of the size of the graph such that f(N)/N — > 0, then the quenched and the annealed 
entropies coincide. We will show that this is the case at least for the second order 
moment. It is important to state, however, that the previous series expansion is valid 
only if < Zj (Z) < 2, and that averaging the resulting series term by term is possible 
only if the sum can be taken out of the integral which defines the average. 

These conditions are not always met in our model, and in particular we expect the 
existence of a certain range of X and a values beyond which the conditions are not 
necessarily satisfied. In particular we will also show that in the thermodynamic limit 

lim \im C(a,X) = oo (12) 

However this will not necessarily undermine the validity of the annealed calculation. 

Under the hypothesis of equation ( flOl) . and thanks to a straightforward 
implementation of the Tschebichev inequality, one can easily show that, (see 



Appendix A.2[) 



Pr(N sol > (Msoi)) < 2- N ^C(a,X) (13) 
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where 7 is a positive constant. This implies that, in the thermodynamic limit, the 
probability distribution of the number of solutions has support in the interval (0, (N so i))- 
Unfortunately in order to take under control also the left tail of the distribution one 
should compute moments of the type (A/^) with < n < 1, which is, as indicated in 



Appendix A. 3, a rather complicated task. 



5.1. General calculation of (N soi) 

Given a probability distribution of classes of Boolean functions 7r(/), drawn 
independently, where / is a generic Boolean function of K = 2 inputs, and given a 
value of a G [0, 1] one can write 

M~aiV 

.III ■ X2, m )))n(f))g (14) 

x m=l 

where the external average is over the graph ensemble, while the internal one is 
over 7r(/). Note that for the XOR class f(xi,X2) = 6X1X2, while for the AND-OR 
class f(xi,X2) = £0/(2(61X162X2 + €1X1 + e 2 £2 — 1)), with e, G {±1} with a chosen 
probability and X G [0, 1]. The variables {x{\ G {±1}^. Averaging over the values 
of {ej}j = o,i,2 and X is therefore equivalent of averaging over the 7r(/). In particular, in 
the case of flat classes distribution, which is the object of the present work, one has 
prob(e) = (5(e - 1) + 5(e + l))/2. Therefore: 

M~aiV 

.in ■ X2,m))){ei},x)g (15) 

x m=l 

Let us now define N Q = aN regulated variables and Ni — (1 — a)N external inputs 
(including the (1 — a)e~ 2a isolated nodes). Assuming the input variables are extracted 
randomly and independently on each of the N clauses; one can write 

N+,N+=0 V ° J V 1 



[P(+ + \+)g(+ + +) + v(+ - \+)g(+ - +)+ 



A', 



+ 



n- + \+)g(- + +)+ n- - \+)g(- - +)] 
\n+ + \-)g(+ + -) + n+ - \-)g(+ - -)+ 

n- + i-M- + -) + n- - \-)g( — )] Mo+ (is) 

where V(pa\r) is the probability of drawing two inputs of sign p and a given the fact 
that one is looking at a clause with output variable sign r, and g(par) is the value of 
the function node 8(1; r/(p, a)) times the probability of extracting a certain Boolean 
function type /. In the case of uniform ir(f), V(pa\r)g(par) is trivially 1/8 V signs 
triplet, leading to (M so i) = 2^~ a ^ N identically. 

For a different distribution this is in general not the case. As a title of example, 
in the case of a mixture of pure XOR and AND functions, without any literal negation, 
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tt(/) = XS(f; U) + (1 - f A ) and eq.® reads 

N„,Ni 

<au = E 

N+,N?=0 



N+ 



iV + 



2X(iV-A^+)(A^- 



r 



x 



(N — 1)(N — 2) 
iV+(iV+ - 1) 
(iV- l)(iV-2) ^ _ 

N + (N-N + 



[1-X) 



(N+ -1)(N+ -2) 



(TV — l)(iV — 2) 
(N-N+ - l)(iV-iV + -2) 
(TV - l)(/V-2) 

1 JV -AT+ 



+ 2(1 -X): 



;i7) 



(JV- l)(N-2) 

Leading order terms can be computed following a calculation identical to the one shown 
below for the second moment, and will be omitted here. 



5.2. General calculation of (J\f 2 ol ) 
For the second moment: 

M~aAT 

Woz) = II ( 6 ( 1 ^ x o,mf(xi, m ,x 2 , m ))S(l-,yo, m f(yi, m ,y2, m ))){e l },x}g(l8) 

x,y m=l 

Averaging uniformly over the function types one obtains 

M~aJV 

W«n> = ^ II ^ (2) (^;^,x 1)m ,x 2)m )) 6 (19) 



m=l 



with 



g ( - 2) (X;x ,x 1 ,x 2 ) 



X \ 1- 

y(l + X XiX 2 )H 



— ((l + x 1 +x 2 + x 1 x 2 )^ + l)(20) 



iV c 

iV+ 



iV 
iV+ 



Averages over different function types distributions are also possible, but beyond the 
scope of this work. Along the same line of arguments of the previous section, averaging 
over the Poissonian graph structure, one obtains 

N„,Ni 

<A£,>= £ 

N+,N+=0 

[V(+ + \+)g(X; + + +)+V(+ 

V(- + \+)g(X;- + +)+V(-- 

[V{+ + \-)g{X- + + -)+V{+ 

V{- + \-)g(X;- + -)+V(-- 

where g(X; par) is now the value of given X and the sign of the inputs. Equations 
(TTE|) and f[2"Tj) have an identical structure. This observation can be generalized and holds 
for any higher order momentum, changing the structure of the functions g. The details 
of the computation of the second moment are reported in Appendix A At the leading 
order it turns out that: 

(A/1) = J (2) (a, X) = 2 2 ^ N C(a, X) (22) 



\+)g(X;+-+)+ 
\+)g(X;--+)} N ° 
\-)g(X; + --)+ 
\-)g(X;- -)]»•-*# (21) 
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The value of the multiplicative constant C(a,X) as a function of X for increasing 
values of a is plotted in fig.©. Note that, unless in the pure XOR case, C(a,X) ^ 1. 




Figure 5. Plot of C(a,X) for the second moment of the distribution of the number 
of solutions in the large N limit. The red uppermost line is the function e^^^/x, 
diverging for x — » 



This means that the analysis of the second order momentum is not enough to assess 
the theoretical validity of the annealed calculation of the entropy in the sense that 
the convergence of the logarithmic correction series is not ensured a priori. Moreover, 
whenever C(a,X) > 1, afa ol = (A/^) - (Koi) 2 oc 2 N ^~ a \ 



5.2.1. The special case of a = 1 At a = 1 several simplifications in the computation of 
the second moment hold. Both the entropic contribution due to the external regulators, 
and the Kullback-Leibler terms are identically zero at the saddle point, as one can check 
from the saddle point equation for (A/" s 2 oZ ) presented in Appendix A It turns out that the 
contribution to the saddle point includes also the term corresponding to the non-zero 
boundary term of integration. For the second moment in particular, one has to take 
into account the contribution of all terms around iV+ = N Q . 

Going back to eq. fl2TT) and taking explicitly into account those terms, one obtains 
in the large N limit 

m^'-ft^ r-"" 1 '-; 1 '^"' (23, 

t=o z - 

where 1^(1, X) = e^' _1 ^/x. The values of both contributions diverge for X — > 0. 
Analytic estimates suggest that the divergence goes as y/N. In fig. [6] we display the 
numerical estimate of {NJ i) obtained via exhaustive enumeration and the analytic 
estimate presented in eq. fl21~|) . The agreement is good, as it should be, since the 
computation of the second moment is exact, also for finite N. 
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Figure 6. Second moment of the number of solutions distribution. Bars arc computed 
by exhaustively counting the solutions for 100000 samples at a = 1, crosses by 
evaluating the analytical formula with Mathcmatica. 



6. Conclusions 

In this work we have investigated several aspects of the fixed points solutions set of a 
particular class of finite size diluted random Boolean networks with K = 2 inputs per 
function. A search algorithm based on a state-of-the-art satisfiability solver was used to 
exhaustively enumerate fixed points up to moderate system sizes. For fixed system 
size ensembles the average number of solutions were plotted together with average 
fluctuations and with two additionally ad hoc defined order parameters indicating 
average distance between solutions within the fixed points set. A throughout comparison 
of the exact results with those given by the heuristic Belief Propagation algorithm was 
done in order to assess the performance of BP for small size samples whose network 
structure significantly deviates from tree-like. BP was shown to perform significantly 
well in the prediction of the correct number of solutions even in small sizes cases. 
BP seems to loose its predicting power in the calculation of single Boolean variables 
marginals at the fixed points; still it was shown to retain a significant correlation 
with exact results in the global spatial arrangements of the solutions. Furthermore, 
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the analytical results closing the paper, together with their agreement with exact 
enumeration results, give a strong hint on the exactness of the annealed calculation 
of the entropy as well as the high order moments of the probability distribution of the 
number of solutions. 
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Appendix A. The computation of the moments of P{M so i) 

We first present the details of the computations of the second moment and we will then 
give some hints on the structure of the generic n th moment. 



Appendix A.l. The second order momentum 

With respect to (Ng ol ), if we keep only the leading order terms in eq. (l2TT) . the sums can 
be approximated by the following integral, where the exponent is given by the entropy 
contribution of the external variables plus a Kullback-Leibler distance term vanishing 
at the saddle point: 



jo jo 

K(a, X, 6+ 6+) = , >/"(!-") = ^(q^) 

° 1 VW(i 

F[a, X, 6+ b+] = (1 - a)H(bf) + aD KL (b+\G{a, X, 6+ &+)) 
B , x h+ h+] „ _ 3fe + fe + + (1 - X)(l - 6+)6+ + (1 + X)(l - b + )bt/2 

1 ' '°°' 0i) 6 (5+)2 + ( 1 _ X )(l-6+)6+ + (l+X)(l-6+) 2 /2 

(1 + X)(b+(1 - b+) + 3(1 - X)(l - 6+)(l - 6+)/2 

i - (b+y + (i - x)(i - b+)b+ - (i + x)(\ - b+y/2 

H(x) = — x log(x) — (1 — x) log(l — x) 

d K l{x\v) = xl °g (|) + (! - x ) lo g (infj 

G(a, X, 6+) = {b + f + (1 - X)b + {\ - b + ) + (1 - b + ) 2 

b+(a,b+,b+) =ab+ + (l-a)bt 

where 6+ = N+/N and 6+ = N+/N Q . The value of the Integral J (2) (a,X) can be 
calculated at the saddle point 

bf = I (A.l) 
K = G(a,XX,bt) (A.2) 
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2 - a(l - a) + aX(l + 3a) - 2y/l - a(l - X) + a 2 X(X - 1) 
" 2ct 2 (l + 3X) 

Finite N corrections can be in principle computed extending the calculation of K to 
higher orders in 0(1/N), and performing an asymptotic expansion around the saddle 
point. Whenever a < 1 and X > it can be seen that the condition fl A. 3j) finds the 
only maximum J 7 , which lies within the integration interval. 

Appendix A. 2. An upper bound on the number of solutions 

In this subsection we will show that in the thermodynamic limit the support of P(M so i) 
is contained in the interval (0, (N so i)), using, as we have already shown in the previous 
section, that: 

{■Afsoi) = 2 (1 - Q)JV 

(N* ol ) = 2^ N C(a,X) (A.3) 

where C(a,X) is a constant independent from N. The one-tailed Chebyshev inequality 
states that: 

Pr (J\f so i > (Koi)) < m ^% y~ 2 ■ (A.4) 

Under the condition N so i > (Nsol) we can express M so i = 2 7VS ' where S' > (1 — a). 
Inserting eqs. flA.31) into eq. flA.4j) we get: 

Pr(M- > (K ol )) < ^^^L)) ^ ^-^C(a,X) 

= 2~ N ^C(a,X) , (A.5) 

where 7 is a positive constant making the right tail of the P(ft/ so i) distribution (i.e. 
for values of M so i > (■Nsol)) exponentially small in N, as we wanted to show. Indeed, 
this simple result is enough to imply that no contribution to the entropy is given by 
instances whose number of solutions is larger than the annealed value. The support of 
the probability distribution of entropy values P(S) must be therefore [0, S annea i e d] ■ In 
order to prove that also smaller values do not take part of the support, one would need 
to calculate fractional order moments, as explained in the end of next section. 

Appendix A.3. Higher order moments 

For the general n th order momentum one can write, along the same line of reasoning: 

N ° Ni AM AM 

(Koi) = E E 



Y\ N* 71 '" <Tn ~ 1 1 _A/" CT1 " 
5(N ; aN)5(N f , (1 - a)N) J] T(&) ■ 

(7 

s(N -j2K i --- an - i nNi-,Y, N i 1 ''' an ~ 1 ) ( A - 6 ) 

(A.7) 



{7V o CT1 "' CT "- 1 }=0 {7V CT1 "' CT "- 1 }=0 
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with 



T{a) 



J 1 ) J 2 ) 



,(!) ~( 2 ) 



(A. 



/ (1) (2) 



^(2) 



, cr n I 1 cr n I 1 |(7i, <7 n _i) 



iV,-, 



where we are summing over all configurations overlaps with N^ 1 '"**™- 1 /N[ 1 "' a ' n ~ 1 
output/input variables of signs 0"i...cr n _i, the P represent the probability of finding 
n — 1 real replicas of an input variables couple in a given function, and g the value of 
the (n — l) th averaged product of the replicated Boolean function. 
As before, from the leading terms of eq. (lA.7j) . one gets 

(Koi) ^I (n) (a,X) 

I^ n \a,X) = 2^ N f 1 H%}4y^(a ) I 1 {io 1 6 l })e^ ,WW ' 11 



with 



rV>[a,X, {KA}} = (1 - a)HV> + aD%l({b }\G^] 



D%l({b }\G( 



n) - 



0"i ...crn— 1 



log 



7 cri ...an— 1 
°0 



(A.9) 



with b?r an 1 = N%" ffn - l /N 0)i and {6 0)i } = {&S-" <Ta ~ 1 }*={±i}»- 1 . The explicit form 
of functions and .fT^ is not given here for brevity. Due to the general form of 
the exponent, the unique saddle point can always be computed as the one fulfilling 
conditions: 



in— 1 



~ h <n-° n -i = G<- n \a, X, {V?"**- 1 }, {b- 1 -** 1 - 1 }) (A.10) 

such that 

7 (n) (a, X) = 2 n ^ N C {n) (a, X) = {N sol ) n C {n) (a, X) (A. 11) 

Unfortunately, the explicit calculation of the function C^ n '(a,X) for all n seems not to 
be easily accessible. Let us just finally note that one could prove that the distribution 
of the logarithm of the number of solutions P(S) tends to S(S — Squealed) in the large 
iV one by showing that: 

(A.12) 

which seems reasonable given the structure of eqs. (\A.9\ IA.101 [A~TTT) . and the numerical 
result displayed in section |4~T1 but, again, too difficult to be shown explicitly. 



lim lim C (n) (a,X) = 1 

n-»0 jV-«-oo 
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